Rank | Count | Beginning |
---|---|---|
23652 | 7142 | El |
50737 | 5317 | La |
31745 | 4811 | En |
91397 | 2941 | Trabajo |
59140 | 2176 | Los |
73288 | 1795 | Por |
78337 | 1407 | Reciba |
69392 | 1405 | Para |
55540 | 1378 | Las |
82090 | 1351 | Se |
37728 | 1299 | Es |
18224 | 1261 | De |
98151 | 1200 | Y |
65170 | 1199 | No |
983 | 1107 | A |
41094 | 970 | Este |
84854 | 935 | Si |
14464 | 927 | Con |
71409 | 861 | Pero |
39609 | 850 | Está |
85721 | 744 | Sin |
88856 | 716 | También |
3980 | 657 | Al |
67602 | 627 | Oferta |
1721 | 592 | Además, |
95009 | 574 | Una |
82724 | 569 | Según |
94998 | 555 | Un |
13540 | 503 | Cómo |
58565 | 477 | Lo |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N=1, 2, 3, 4. This subsection starts with N=1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
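The query above can be mirrored outside the database. The following is a minimal Python sketch, not the implementation behind the table: the function name `sentence_beginnings`, its parameters, and the sample sentences are hypothetical, and it assumes the sentences are already available as a list of strings. Like `substring_index(sentence, ' ', N)`, it keeps everything before the N-th space, so attached punctuation stays part of the beginning (compare the entry "Además," in the table).

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent sentence beginnings of n words.

    Hypothetical helper: splits on single spaces only, mirroring
    substring_index(sentence, ' ', n) in the SQL query.
    """
    counts = Counter()
    for sentence in sentences:
        # Keep the first n space-separated tokens; a sentence with fewer
        # than n tokens is counted as a whole, as substring_index does.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by descending count, like "order by cnt desc".
    return counts.most_common(limit)


# Made-up sample sentences, for illustration only.
sample = [
    "El gato duerme.",
    "El perro ladra.",
    "La casa es grande.",
    "Además, llueve.",
]
print(sentence_beginnings(sample, n=1))
print(sentence_beginnings(sample, n=2))
```

Changing `n` to 2, 3, or 4 gives the longer beginnings covered in the following subsections, under the same space-splitting assumption.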